doExecute(): RDD[InternalRow]
SparkPlan
— Physical Execution Plan
SparkPlan
is the base QueryPlan for physical operators to build physical execution plan of a structured query (which is also modelled as…a Dataset!).
The SparkPlan contract assumes that concrete physical operators define doExecute method which is executed when the final execute
is called.
Note
|
The final execute is triggered when the QueryExecution (of a Dataset ) is requested for a RDD[InternalRow ].
|
When executed, a SparkPlan
produces RDDs of InternalRow (i.e. RDD[InternalRow]
s).
Caution
|
FIXME SparkPlan is Serializable . Why?
|
Note
|
The naming convention for physical operators in Spark’s source code is to have their names end with the Exec prefix, e.g. DebugExec or LocalTableScanExec.
|
Tip
|
Read InternalRow about the internal binary row format. |
Name | Description |
---|---|
|
|
|
|
|
SparkPlan
has the following final
methods that prepare environment and pass calls on to corresponding methods that constitute SparkPlan Contract:
-
execute
callsdoExecute
-
prepare
callsdoPrepare
-
executeBroadcast
callsdoExecuteBroadcast
Name | Description |
---|---|
waitForSubqueries
Method
Caution
|
FIXME |
prepare
Method
Caution
|
FIXME |
executeCollect
Method
Caution
|
FIXME |
SparkPlan
Contract
SparkPlan
contract requires that concrete physical operators (aka physical plans) define their own custom doExecute
.
Name | Description |
---|---|
Prepares execution |
|
Caution
|
FIXME Why are there two executes? |
Executing Query in Scope (after Preparations) — executeQuery
Final Method
executeQuery[T](query: => T): T
executeQuery
executes query
in a scope (i.e. so that all RDDs created will have the same scope).
Internally, executeQuery
calls prepare and waitForSubqueries before executing query
.
Note
|
executeQuery is executed as part of execute, executeBroadcast and when CodegenSupport produces a Java source code.
|
Computing Query Result As Broadcast Variable — executeBroadcast
Final Method
executeBroadcast[T](): broadcast.Broadcast[T]
executeBroadcast
returns the results of the query as a broadcast variable.
Internally, executeBroadcast
executes doExecuteBroadcast inside executeQuery.
Note
|
executeBroadcast is executed in BroadcastHashJoinExec, BroadcastNestedLoopJoinExec and ReusedExchangeExec .
|
SQLMetric
SQLMetric
is an accumulator that accumulates and produces long metrics values.
There are three known SQLMetrics
:
-
sum
-
size
-
timing
metrics Lookup Table
metrics: Map[String, SQLMetric] = Map.empty
metrics
is a private[sql]
lookup table of supported SQLMetrics by their names.